{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "https://mp.weixin.qq.com/s/hcGintcTQtiuAEinghvSbg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# `ELMo`原理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Embeddings from Language Models`：\n",
    "- **根据上下文动态调整当前词的词向量表示**，如下例中`apple`两种不同的涵义，对应不同的词向量\n",
    "```\n",
    "I like to eat apple\n",
    "I like apple products\n",
    "```\n",
    "\n",
    "`ELMo`采用了典型的两阶段过程：\n",
    "- 利用大量语料，预训练一个语言模型；该语言模型相当于一个**动态词向量生成器**，用于给具体任务生成词向量；\n",
    "- 在进行下游任务时，从第一阶段预训练模型中提取对应单词的网络各层的`Word Embedding`作为新特征补充到下游任务中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 预训练阶段\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<img src=\"../images/ELMo.jpg\" width=\"80%\">\n",
    "- 模型采用了双向双层`LSTM`，输入为单词或字符的`embedding`\n",
    "        \n",
    "        \n",
    "- 模型的最底层输入$(E_1,E_2,...,E_N)$，代表一句话中N个词的初始词向量，这个词向量是利用字符卷积`(char-cnn)`得到  \n",
    "- 模型的输出$(T_1,T_2,...,T_N)$，$T_i$代表每个单词出现的概率，即预测位置 i 处的单词为哪一个\n",
    "\n",
    "   \n",
    "- 预训练的目标是根据单词$w_i$的上下文去正确预测单词$w_i$；\n",
    "    - 上图中左边的`LSTM`进行正向的预测，从左到右依次预测句子的单词；给定上文单词$\\{t_1,...,t_{k-1}\\}$，对单词$t_k$的概率建模，计算序列出现的概率\n",
    "$$p(t_1,t_2,...,t_N)=\\prod_{k=1}^{N}p(t_k|t_1,t_2,...,t_{k-1})$$\n",
    "    - 右边的逆向双层`LSTM`代表反方向编码器，输入的是从右到左的句子下文。\n",
    "$$p(t_1,t_2,...,t_N)=\\prod_{k=1}^{N}p(t_k|t_{k+1},t_{k+2},...,t_N)$$\n",
    "    - 模型的目标就是同时最大化前后向语言模型的对数似然\n",
    "$$\\sum_{k=1}^{N}\\Big(\\big(logp(t_k|t_1,t_2,...,t_{k-1})\\big)+\\big(logp(t_k|t_{k+1},t_{k+2},...,t_N)\\big)\\Big)$$\n",
    "\n",
    "- 因此，对于每个输入的`token`$t_k$，`ELMo`利用`L`层的双向`LSTM`将其表示成`2L+1`个向量：原始的输入向量，及每一层的前向`LSTM`输出，和逆向`LSTM`输出，通常会将前向和逆向输出拼接，作为该层的输出向量\n",
    "- 原始输入向量表征单词特征，第一层`LSTM`表征句法特征，第二层`LSTM`表征语义特征；越高层，越能捕获词意信息，越能区分一词多义，表示对词义消歧做的越好。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`ELMo`的变体，前向网络和逆向网络不是分开，而是合并后再两层堆叠起来，且使用残差层，将原始输入向量和第一层的输出合并作为第二层的输入\n",
    " \n",
    "<img src=\"../images/elmo_combination.png\" width=\"80%\" alt=\"elmo\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用语言模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将每一层的双向`LSTM`产生的向量乘以权重参数，作为下游任务的一种特征输入，将该特征与下游任务的词向量进行拼接，构成最终的词向量进行任务的训练。下游任务训练时，冻结`ELMo`的参数\n",
    "$$\\text{EMLo}_k=\\gamma\\sum_{j=0}^{L}s_jh_{k,j}$$\n",
    "$$\\text{INPUT}_k=\\big[E_k,\\text{EMLo}_k\\big]$$\n",
    "其中$s_j$表示每一层输出向量的权重，$h_{k,j}$表示第$k$个单词在第$j$层的输出;\n",
    "\n",
    "双层的`LSTM`获得的词向量：\n",
    "    $$\\text{ELMo}_k^{task} = \\gamma_k \\cdot (s_0^{task}\\cdot x_{k} + s_1^{task}\\cdot h_{1,k} + s_2^{task} \\cdot h_{2,k})$$\n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用`ELMo`语言模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-04-30T09:04:46.886696Z",
     "start_time": "2020-04-30T09:04:46.850306Z"
    }
   },
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "Exporting/importing meta graphs is not supported when eager execution is enabled. No graph exists when eager execution is enabled.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-7-681eb558a895>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     12\u001b[0m \u001b[0;31m# # array = sess.run(embeddings)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     13\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 14\u001b[0;31m \u001b[0melmo\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhub\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mModule\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"../../H/tfhub/elmo/\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtrainable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     15\u001b[0m embeddings = elmo(\n\u001b[1;32m     16\u001b[0m     \u001b[0;34m[\u001b[0m\u001b[0;34m\"the cat is on the mat\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"dogs are in the fog\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_hub/module.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, spec, trainable, name, tags)\u001b[0m\n\u001b[1;32m    174\u001b[0m           \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_name\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    175\u001b[0m           \u001b[0mtrainable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_trainable\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 176\u001b[0;31m           tags=self._tags)\n\u001b[0m\u001b[1;32m    177\u001b[0m       \u001b[0;31m# pylint: enable=protected-access\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    178\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_hub/native_module.py\u001b[0m in \u001b[0;36m_create_impl\u001b[0;34m(self, name, trainable, tags)\u001b[0m\n\u001b[1;32m    384\u001b[0m         \u001b[0mtrainable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtrainable\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    385\u001b[0m         \u001b[0mcheckpoint_path\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_checkpoint_variables_path\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 386\u001b[0;31m         name=name)\n\u001b[0m\u001b[1;32m    387\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    388\u001b[0m   \u001b[0;32mdef\u001b[0m \u001b[0m_export\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvariables_saver\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_hub/native_module.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, spec, meta_graph, trainable, checkpoint_path, name)\u001b[0m\n\u001b[1;32m    443\u001b[0m     \u001b[0;31m# TPU training code.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    444\u001b[0m     \u001b[0;32mwith\u001b[0m \u001b[0mscope_func\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 445\u001b[0;31m       \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_init_state\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    446\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    447\u001b[0m   \u001b[0;32mdef\u001b[0m \u001b[0m_init_state\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_hub/native_module.py\u001b[0m in \u001b[0;36m_init_state\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m    446\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    447\u001b[0m   \u001b[0;32mdef\u001b[0m \u001b[0m_init_state\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 448\u001b[0;31m     \u001b[0mvariable_tensor_map\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_state_map\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_create_state_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    449\u001b[0m     self._variable_map = recover_partitioned_variable_map(\n\u001b[1;32m    450\u001b[0m         get_node_map_from_tensor_map(variable_tensor_map))\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_hub/native_module.py\u001b[0m in \u001b[0;36m_create_state_graph\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m    503\u001b[0m         \u001b[0mmeta_graph\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    504\u001b[0m         \u001b[0minput_map\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 505\u001b[0;31m         import_scope=relative_scope_name)\n\u001b[0m\u001b[1;32m    506\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    507\u001b[0m     \u001b[0;31m# Build a list from the variable name in the module definition to the actual\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py\u001b[0m in \u001b[0;36mimport_meta_graph\u001b[0;34m(meta_graph_or_file, clear_devices, import_scope, **kwargs)\u001b[0m\n\u001b[1;32m   1451\u001b[0m   return _import_meta_graph_with_return_elements(meta_graph_or_file,\n\u001b[1;32m   1452\u001b[0m                                                  \u001b[0mclear_devices\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mimport_scope\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1453\u001b[0;31m                                                  **kwargs)[0]\n\u001b[0m\u001b[1;32m   1454\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1455\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py\u001b[0m in \u001b[0;36m_import_meta_graph_with_return_elements\u001b[0;34m(meta_graph_or_file, clear_devices, import_scope, return_elements, **kwargs)\u001b[0m\n\u001b[1;32m   1461\u001b[0m   \u001b[0;34m\"\"\"Import MetaGraph, and return both a saver and returned elements.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1462\u001b[0m   \u001b[0;32mif\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecuting_eagerly\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1463\u001b[0;31m     raise RuntimeError(\"Exporting/importing meta graphs is not supported when \"\n\u001b[0m\u001b[1;32m   1464\u001b[0m                        \u001b[0;34m\"eager execution is enabled. No graph exists when eager \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1465\u001b[0m                        \"execution is enabled.\")\n",
      "\u001b[0;31mRuntimeError\u001b[0m: Exporting/importing meta graphs is not supported when eager execution is enabled. No graph exists when eager execution is enabled."
     ]
    }
   ],
   "source": [
    "import tensorflow_hub as hub\n",
    "import tensorflow as tf\n",
    "# # tf.compat.v1.disable_eager_execution()\n",
    "\n",
    "# elmo = hub.load(\"../H/tfhub/elmo/\")\n",
    "# texts = [\"the cat is on the mat\", \"dogs are in the fog\"]\n",
    "# embeddings = elmo(texts, signature='default', as_dict=True)[\"default\"]\n",
    "\n",
    "# # from tensorflow.python.keras import backend as K\n",
    "\n",
    "# # sess = K.get_session()\n",
    "# # array = sess.run(embeddings)\n",
    "\n",
    "elmo = hub.Module(\"../../H/tfhub/elmo/\", trainable=True)\n",
    "embeddings = elmo(\n",
    "    [\"the cat is on the mat\", \"dogs are in the fog\"],\n",
    "    signature=\"default\",\n",
    "    as_dict=True)[\"elmo\"]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tensorflow_one",
   "language": "python",
   "name": "tensorflow_one"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
